Using collocations from comparable corpora to find translation equivalents
Abstract
In this paper we present a tool for finding appropriate translation equivalents for words from the general lexicon using comparable corpora. For a phrase in the source language the tool suggests a range of possible expressions used in similar contexts in target language corpora. In the paper we discuss the method and present results of human evaluation of the performance of the tool.
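The idea of suggesting target-language expressions that occur in similar contexts can be illustrated with a minimal sketch. This is not the authors' tool: it assumes a generic context-vector approach in which a small seed lexicon (hypothetical here) maps source-language context words into the target language, and candidates are ranked by cosine similarity. The toy corpora, the function names, and the window size are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def context_vector(word, sentences, window=2):
    """Count tokens that co-occur with `word` within `window` positions."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def translate_vector(vec, seed_lexicon):
    """Map context words into the target language; drop unknown words."""
    out = Counter()
    for w, c in vec.items():
        if w in seed_lexicon:
            out[seed_lexicon[w]] += c
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_equivalents(source_word, src_sents, tgt_sents, seed_lexicon, candidates):
    """Rank target candidates by contextual similarity to the source word."""
    src_vec = translate_vector(context_vector(source_word, src_sents), seed_lexicon)
    scored = [(c, cosine(src_vec, context_vector(c, tgt_sents))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "comparable corpora" (assumed data, not from the paper).
src_sents = [
    ["athletes", "break", "record", "today"],
    ["she", "will", "break", "record", "again"],
]
tgt_sents = [
    ["athlètes", "battre", "record", "aujourd'hui"],
    ["il", "va", "casser", "verre", "encore"],
]
# Hypothetical seed lexicon bridging the two languages.
seed_lexicon = {
    "athletes": "athlètes",
    "record": "record",
    "today": "aujourd'hui",
    "again": "encore",
}

ranked = rank_equivalents("break", src_sents, tgt_sents, seed_lexicon,
                          ["battre", "casser"])
for candidate, score in ranked:
    print(f"{candidate}\t{score:.3f}")
```

On this toy data the candidate that shares the "record" context ranks above the one that does not, which is the behaviour such a tool relies on; a real system would need much larger corpora, weighting of co-occurrence counts, and handling of multiword expressions.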
Similar resources
Extracting collocations and their translations from parallel corpora
Identifying collocations in a text (e.g., break record) and correctly translating them (battre record vs. *casser record) represent key issues in machine translation, notably because of their prevalence in language and their syntactic flexibility. This article describes a method for discovering translation equivalents for collocations from parallel corpora, aimed at increasing the lexical cover...
Using bilingual word-embeddings for multilingual collocation extraction
This paper presents a new strategy for multilingual collocation extraction which takes advantage of parallel corpora to learn bilingual word-embeddings. Monolingual collocation candidates are retrieved using Universal Dependencies, while the distributional models are then applied to search for equivalents of the elements of each collocation in the target languages. The proposed method extracts ...
Harnessing the lawless: using comparable corpora to find translation equivalents
Bilingual dictionaries provide basic translation equivalents for a headword and typically limit the set of equivalents to words of the same part of speech as the headword. However, words taken in their contexts can be translated in many more ways. At the same time, equivalents listed in dictionaries are not adequate in many contexts, because of the contextual and collocational sensitivity of ta...
Adapted Seed Lexicon and Combined Bidirectional Similarity Measures for Translation Equivalent Extraction from Comparable Corpora
An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexicon, which is used to bridge contexts in different languages, is adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by ...
Collocation translation based on sentence alignment and parsing
To date, substantial efforts have been devoted to the extraction of collocations from text corpora. However, only a few works deal with the subsequent processing of results in order for these to be successfully integrated into the NLP applications that could benefit from them (e.g., machine translation). This paper presents an accurate method for identifying translation equivalents of collocati...